Abstract

This paper examines a class of proximal minimization algorithms in which the
objective function of the underlying convex program is approximated by cutting
planes. This class includes the cutting plane algorithm, the cutting plane
algorithm with line search, and bundle methods. Among these algorithms, the
bundle methods can be viewed as a quadratic counterpart of the cutting plane
algorithm with line search, for they both attempt to decrease the true objective
function at every iteration. The cutting plane algorithm, on the other hand,
does not directly attempt to decrease the true objective function. Instead, it
relies on the monotonicity of the approximating function to guarantee
convergence to an optimal solution. This prompts the question of whether there
exists a quadratic counterpart of the cutting plane algorithm. To provide an
affirmative answer, this paper constructs a new convergent algorithm which
resembles, but is different from, the bundle methods. In addition, to make the
relationship between bundle methods and proximal minimization more concrete,
this paper supplies a convergence proof for a variant of the bundle methods
which utilizes analysis common to proximal minimization.
1. Introduction

This paper proposes an application of the proximal minimization algorithm to
the following problem:

$$D:\quad \mathcal{L}(u^*) = \min_{u \in U} \mathcal{L}(u)$$

where $\mathcal{L}(u)$ is convex and $U$ is a compact subset of $R^m$. In
particular, $U$ is assumed to be a polyhedron of the form
$\{u : Au \le b,\ u \in R^m\}$, where $A$ is a $p \times m$ matrix and $b$ is a
vector in $R^p$. To simplify the presentation and motivate applications to
Lagrangian duality and variational inequalities, the objective function
$\mathcal{L}(u)$ is also assumed to have the following form:

$$\mathcal{L}(u) = \max_{x \in X} \{f(x) + u \cdot g(x)\} \qquad (1)$$

where $X$ is a compact subset of $R^n$, $f(x)$ is a real-valued function on
$R^n$, and $g(x)$ is a vector-valued function mapping $R^n$ to $R^m$. The
notation $a \cdot b$ denotes the usual dot product between two vectors, $a$ and
$b$.

When $U$ is taken to be the (noncompact) set $\{u : u \ge 0,\ u \in R^m\}$, $D$
is simply the Lagrangian dual problem of the following nonlinear program:

$$P:\quad f(x^*) = \max f(x) \quad \text{s.t.} \quad g(x) \ge 0,\quad x \in X.$$

Under an additional assumption that there exists an $x$ such that $g(x) > 0$,
the solution to $P$ can be obtained by solving $D$ with
$U = \{u : 0 \le u \le M,\ u \in R^m\}$, where $M$ is sufficiently large.

On the other hand, when $f(x) = -F(x) \cdot x$ and $g(x) = F(x)$, where $F(x)$
is a continuous mapping from $R^m$ into itself and satisfies, for some
$\alpha > 0$,

$$(F(u) - F(x)) \cdot (u - x) \ge \alpha \|u - x\|^2, \qquad \forall u, x \in U,$$

then

$$\mathcal{L}(u) = \max_{x \in U} \{-F(x) \cdot (x - u)\}$$

and $D$ becomes

$$\min_{u \in U} \max_{x \in U} \{-F(x) \cdot (x - u)\} = -\max_{u \in U} \min_{x \in U} \{F(x) \cdot (x - u)\}.$$

Hearn et al. (1984) referred to the problem on the right as the dual of the
formulation based on the gap function for the following variational inequality:

Find $u^* \in U$ such that $F(u^*) \cdot (x - u^*) \ge 0$, $\forall x \in U$.

For the remainder, it is convenient to simply refer to $\mathcal{L}(u)$ as the
dual function.

To solve $D$, the proximal minimization algorithm (see, e.g., Bertsekas and
Tsitsiklis, 1990; Martinet, 1970; and Rockafellar, 1976) generates a sequence
of points in $U$ by the iteration

$$u^{k+1} = \arg\min_{u \in U} \Big\{\mathcal{L}(u) + \frac{1}{2c_k}\|u - u^k\|^2\Big\}, \qquad k = 1, 2, \ldots \qquad (2)$$

where $u^1$ is a starting point, $\|\cdot\|$ denotes the Euclidean norm and
$\{c_k\}$ is a sequence of positive numbers with

$$\liminf_{k \to \infty} c_k > 0.$$

Although the above iterative process converges to an optimal solution of $D$,
there is a concern regarding its practicality. Bertsekas and Tsitsiklis (1990)
pointed out in their book that the proximal minimization algorithm requires
solutions to a sequence of problems instead of just one problem. When
$\mathcal{L}(u)$ is nondifferentiable, this concern is more acute. Adding the
'proximal' term $\frac{1}{2c_k}\|u - u^k\|^2$ only makes the objective function
of the problem in (2) strictly convex. So, when $\mathcal{L}(u)$ is
nondifferentiable, the objective function in (2) is still nondifferentiable, and
solving a sequence of nondifferentiable, but strictly convex, problems does not
appear as attractive as solving only one nondifferentiable problem that may not
be strictly convex.

To make proximal minimization more amenable to $D$, this paper approximates
$\mathcal{L}(u)$ in (2) by the following function:

$$\mathcal{L}(u) \approx L^k(u) = \max_{i = 1, \ldots, k} \{f(x^i) + u \cdot g(x^i)\}$$

where $x^i \in X$. When $x^i$ is chosen appropriately, $L^k(u)$ is simply a
maximum of a finite number of hyperplanes tangential to $\mathcal{L}(u)$. These
hyperplanes are generally known as cuts or cutting planes.
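The approximation above can be illustrated with a minimal Python sketch. It assumes a finite set X so that the inner maximization can be done by enumeration; the toy data (X, f, g) and the helper names `dual_value` and `model_value` are illustrative choices, not part of the paper.

```python
from typing import Callable, List, Sequence, Tuple

Cut = Tuple[float, Tuple[float, ...]]  # one cut: (f(x^i), g(x^i))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def dual_value(u: Sequence[float], X: list, f: Callable, g: Callable):
    """Evaluate L(u) = max over x in X of f(x) + u.g(x); return (value, maximizer)."""
    best = max(X, key=lambda x: f(x) + dot(u, g(x)))
    return f(best) + dot(u, g(best)), best


def model_value(u: Sequence[float], cuts: List[Cut]) -> float:
    """Evaluate the cutting plane model L^k(u) = max_i f(x^i) + u.g(x^i)."""
    return max(f_i + dot(u, g_i) for f_i, g_i in cuts)


# Toy data (assumed for illustration): X = {0, 1, 2, 3}, f(x) = x, g(x) = 1 - x.
X = [0, 1, 2, 3]
f = lambda x: float(x)
g = lambda x: (1.0 - x,)

# Build one cut at each of two trial points; each cut is tangential to L there.
cuts: List[Cut] = []
for u_trial in [(4.0,), (0.0,)]:
    _, x_i = dual_value(u_trial, X, f, g)
    cuts.append((f(x_i), g(x_i)))
```

With only the first cut, the model underestimates the dual function away from the point where the cut was generated and agrees with it at that point, which is the property the algorithms below rely on.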


To unify the above scheme with other algorithms that use cutting planes, this
paper describes in the next section a generic algorithm which combines cutting
planes with proximal minimization. From this generic algorithm, three
algorithms from the literature can be derived: the cutting plane algorithm, the
cutting plane algorithm with line search and the family of bundle methods.
Among these algorithms, the bundle methods can be viewed as a quadratic
counterpart of the cutting plane algorithm with line search or vice versa, i.e.,
the latter is a linear counterpart of the former. This prompts the question of
whether there exists a quadratic counterpart of the (plain) cutting plane
algorithm. The results in this paper provide an affirmative response to the
question.

In the remainder of the paper, Section 2 formally states the generic algorithm
and derives from it the three algorithms in the literature. Also derived is the
new algorithm, which is a quadratic counterpart of the cutting plane algorithm.
Section 3 provides convergence results for the new algorithm. To establish a
closer relationship between proximal minimization and bundle methods, Section 4
provides a convergence proof for a simple version of the latter which is
different from those in the literature and uses analysis common to proximal
minimization. Finally, Section 5 concludes the paper.


2. Classification of Algorithms

To classify and establish relationships among algorithms, we first state a
generic algorithm and then show how it can be specialized to four algorithms.
Three of the four exist in the literature and the last is new and shown to be a
quadratic counterpart of the cutting plane algorithm.

A GENERIC ALGORITHM

Step 1: Select $u^1 \in U$. Set $k = 1$, $v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \Big\{L^k(u) + \frac{1}{2c_k}\|u - v^k\|^2\Big\}.$$

If $v^k$ also solves the problem, stop; $v^k$ solves $D$.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Derive $v^{k+1} \in U$ from $u^{k+1}$ and $v^k$ using some process and/or
criteria (see discussion below). Set $k = k + 1$ and return to Step 2.


Note that the (master) problem in Step 2 is slightly different from the one in
equation (2) of the previous section. The 'prox-center' in the proximal term is
$v^k$ for the master problem and $u^k$ for the problem in (2). In addition, the
master problem in Step 2 can be stated as

$$MP:\quad \min\ w + \frac{1}{2c_k}\|u - v^k\|^2$$
$$\text{s.t.}\quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$\qquad\ Au \le b$$

where the first $k$ constraints are generally referred to as cuts or cutting
planes. The dual of MP can be written as

$$MD:\quad \max\ -\frac{c_k}{2}\|G\pi + A^T\lambda\|^2 + (G^T v^k + f) \cdot \pi + (Av^k - b) \cdot \lambda$$
$$\text{s.t.}\quad \sum_{i=1}^{k} \pi_i = 1$$
$$\qquad\ \pi_i \ge 0, \quad i = 1, \ldots, k$$
$$\qquad\ \lambda_j \ge 0, \quad j = 1, \ldots, p$$

where $f$ denotes a vector in $R^k$ with $f(x^i)$ as its components, $G$ denotes
an $m \times k$ matrix with $g(x^i)$ as its columns, $\pi_i$ are the dual
variables corresponding to the cutting plane constraints and $\lambda_j$ are the
dual variables corresponding to the constraints defined by the matrix $A$. In
any case, both the master problem and its dual can be solved in a finite number
of iterations. Pang (1983) and Lin and Pang (1987) reviewed a large number of
algorithms applicable to both MP and MD. More specifically, Kiwiel (1991)
designed a dual algorithm to solve MP and Bertsekas (1982) proposed an efficient
algorithm designed especially for convex programming problems with simple
constraints such as those in MD.
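For readers who want to see where MD comes from, the following is a sketch of the standard Lagrangian derivation. It assumes multipliers $\pi \ge 0$ for the cuts and $\lambda \ge 0$ for $Au \le b$; the symbol $\Lambda$ for the Lagrangian is introduced here for illustration only and does not appear in the paper.

```latex
\begin{aligned}
\Lambda(w,u,\pi,\lambda)
  &= w + \frac{1}{2c_k}\|u - v^k\|^2
     + \sum_{i=1}^{k} \pi_i \bigl(f(x^i) + u \cdot g(x^i) - w\bigr)
     + \lambda \cdot (Au - b), \\
\frac{\partial \Lambda}{\partial w} = 0
  &\;\Longrightarrow\; \sum_{i=1}^{k} \pi_i = 1, \\
\nabla_u \Lambda = 0
  &\;\Longrightarrow\; u = v^k - c_k\,(G\pi + A^T\lambda), \\
\min_{w,u} \Lambda
  &= -\frac{c_k}{2}\|G\pi + A^T\lambda\|^2
     + (G^T v^k + f) \cdot \pi + (Av^k - b) \cdot \lambda .
\end{aligned}
```

The third line also indicates how a primal solution can be recovered from an optimal dual pair: the new iterate is the prox-center moved along the negative of the aggregated direction $G\pi + A^T\lambda$, scaled by $c_k$.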

Below, we describe four specializations of the generic algorithm. They are the
cutting plane algorithm, the cutting plane algorithm with line search, the
bundle methods and a new algorithm called the proximal minimization algorithm
with cutting planes.

The cutting plane (CP) algorithm: The generic algorithm reduces to the CP
algorithm when $c_k = \infty$ and $v^{k+1} = u^{k+1}$ $\forall k$. First, setting
$c_k = \infty$ makes the proximal term vanish from the objective function of the
master problem in Step 2, thereby reducing it to the following linear program:

$$ML:\quad \min\ w$$
$$\text{s.t.}\quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$\qquad\ Au \le b$$

Without the proximal term and with $v^{k+1} = u^{k+1}$ always, the variable $v$
becomes superfluous and can be eliminated from the algorithm entirely. This
reduces Step 4 to simply incrementing $k$ by one. It can be shown that the
stopping rule in Step 2 of the generic algorithm is equivalent to the one
typical for the CP algorithm, which is to stop when
$\mathcal{L}(u^{k+1}) = L^k(u^{k+1})$ in Step 3.

The CP algorithm was first introduced by Cheney and Goldstein (1959) and Kelly
(1960). Dantzig and Wolfe (1960) developed a related algorithm called the
column generation technique in the context of decomposing large scale linear
programs. Column generation was later generalized to solve Lagrangian dual
problems for mathematical programs (see Dantzig, 1963 and Magnanti et al., 1976)
and was given the name generalized linear programming technique. Regardless of
the terminology, it is well known (see, e.g., Dantzig, 1963, Magnanti et al.,
1976 and Zangwill, 1969) that the convergence of the CP algorithm follows from
the monotonicity of the sequence $\{w^k\}$ or $\{L^{k-1}(u^k)\}$. However, the
corresponding sequence of dual function values, $\{\mathcal{L}(u^k)\}$, is not
necessarily monotonic. Therefore, the CP algorithm is a variant of the generic
algorithm which has a linear master problem and does not attempt to descend the
dual function.

The cutting plane algorithm with line search (CPLS): In an effort to force the
CP algorithm to descend the dual function, Hearn and Lawphongpanich (1989b &
1990) added a line search step. CPLS can be obtained from the generic algorithm
by setting $c_k = \infty$ for all $k$ and, in Step 4, letting

$$v^{k+1} = v^k + \lambda_k (u^{k+1} - v^k), \qquad \lambda_k = \arg\min_{0 \le \lambda \le \lambda_{up}} \{\mathcal{L}(v^k + \lambda(u^{k+1} - v^k))\}$$

where $\lambda_{up} = \max\{\lambda : v^k + \lambda(u^{k+1} - v^k) \in U\}$. Thus,
$v^{k+1}$ minimizes $\mathcal{L}(u)$ along the direction $d^k = u^{k+1} - v^k$.
Hearn and Lawphongpanich (1989a) showed that, if $\mathcal{L}(u)$ is
differentiable at $v^k$, then $d^k$ is a descent direction and
$\mathcal{L}(v^{k+1}) < \mathcal{L}(v^k)$. Therefore, CPLS is a variant of the
generic algorithm which has a linear master problem and attempts to descend the
dual function, i.e., a descent is guaranteed whenever the dual function is
differentiable at the current iterate, $v^k$.
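The line search step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes U is given as {u : Au <= b}, computes the largest feasible step by a ratio test, and uses a ternary search (valid because the dual function is convex along the segment). The function names and the toy dual function are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def max_step(A: List[List[float]], b: List[float],
             v: Sequence[float], d: Sequence[float]) -> float:
    """Largest lam with A(v + lam*d) <= b, assuming Av <= b (ratio test)."""
    lam_up = math.inf
    for row, b_j in zip(A, b):
        slope = dot(row, d)
        if slope > 1e-12:  # only rows that tighten along d limit the step
            lam_up = min(lam_up, (b_j - dot(row, v)) / slope)
    return lam_up


def line_search_step(dual: Callable[[List[float]], float],
                     A: List[List[float]], b: List[float],
                     v: Sequence[float], u_next: Sequence[float],
                     iters: int = 200) -> List[float]:
    """Return v + lam*(u_next - v) minimizing the convex dual on [0, lam_up]."""
    d = [un - vi for un, vi in zip(u_next, v)]
    lam_up = max_step(A, b, v, d)
    if not math.isfinite(lam_up):  # d = 0 (or an unbounded direction): stay put
        return list(v)
    point = lambda lam: [vi + lam * di for vi, di in zip(v, d)]
    lo, hi = 0.0, lam_up
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        c = hi - (hi - lo) / 3.0
        if dual(point(a)) <= dual(point(c)):
            hi = c
        else:
            lo = a
    return point(0.5 * (lo + hi))


# Toy example (assumed): L(u) = max_{x in {0,1,2,3}} x + u*(1 - x) on U = [0, 5].
toy_dual = lambda u: max(x + u[0] * (1.0 - x) for x in range(4))
v_new = line_search_step(toy_dual, [[1.0], [-1.0]], [5.0, 0.0], [4.0], [0.0])
```

In this toy case the master iterate lies at the boundary of U, while the line search stops at the kink of the dual function in between, which is the behaviour the descent step is meant to provide.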

The bundle methods: As in CPLS, the main thrust of the bundle methods, first
introduced by Lemarechal (1974, 1975) and Wolfe (1975), is to generate a
monotonic sequence of dual function values. From the generic algorithm, one can
obtain a version of the bundle methods by setting $c_k < \infty$ for all $k$
and, in Step 4, letting

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m\,(L^k(v^k) - L^k(u^{k+1})) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Other methods for determining $v^{k+1}$ exist and they can
be found in, e.g., Auslender (1987), Fukushima (1984), Gaudioso and Monaco
(1982), Kiwiel (1985 & 1989), Lemarechal (1989) and Mifflin (1977). Also, note
that updating $v^{k+1}$ is in essence choosing the prox-center for the next
iteration.

Several authors (e.g., Fukushima, 1984, Kiwiel, 1989 and Lemarechal, 1991) have
observed the similarity between proximal minimization and bundle methods.
However, it is interesting that the developments of the two types of algorithms
appear different. In an effort to unify the development of bundle methods and
proximal minimization, Section 4 provides a convergence proof for a simple
method for updating $v^{k+1}$ which is different from, but related to, the one
shown above.

When $v^{k+1} = u^{k+1}$, the $k$th iteration is called a 'serious' step.
Otherwise (i.e., $v^{k+1} = v^k$), it is called a 'null' step. So, after every
serious step, the dual function decreases and bundle methods change the
prox-center. Since $c_k < \infty$, the master problem for the bundle methods is
quadratic (see problem MP or MD). Therefore, any bundle method can be considered
as a quadratic counterpart of CPLS since it has a quadratic master problem and
attempts to descend the dual function, in that it decreases the dual function
at every serious step. To emphasize the fact that bundle methods are variants of
the generic algorithm, we also refer to them as proximal minimization algorithms
with subgradient bundles (PMSB).
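The serious/null step decision can be written as a small helper. The sketch below follows the updating rule stated in this section; the function name, argument names and the numerical values in the usage lines are hypothetical and chosen only to show one serious and one null step.

```python
from typing import Sequence, Tuple


def bundle_update(u_next: Sequence[float], v: Sequence[float],
                  dual_next: float, dual_center: float,
                  model_center: float, model_next: float,
                  m: float = 0.5) -> Tuple[Sequence[float], bool]:
    """Return (new prox-center, is_serious).

    dual_next    = L(u^{k+1}),  dual_center = L(v^k)      (true dual values)
    model_center = L^k(v^k),    model_next  = L^k(u^{k+1}) (cutting plane model)
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie in (0, 1)")
    predicted_decrease = model_center - model_next
    if dual_next + m * predicted_decrease <= dual_center:
        return u_next, True   # serious step: move the prox-center
    return v, False           # null step: keep the prox-center, only add a cut


# Hypothetical numbers: the model predicts a decrease of 2.0 from the center.
center_a, serious_a = bundle_update((1.0,), (3.0,), 1.5, 3.0, 3.0, 1.0)
center_b, serious_b = bundle_update((1.0,), (3.0,), 2.5, 3.0, 3.0, 1.0)
```

In the first call the true dual value achieves at least the fraction m of the predicted decrease, so the center moves; in the second it does not, so the center is kept.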

A proximal minimization algorithm with cutting planes (PMCP): Setting
$c_k < \infty$ and always letting $v^{k+1} = u^{k+1}$ in Step 4 produces a
variant of the generic algorithm which has a quadratic master problem and does
not attempt to descend the dual function. Note that PMCP is similar to the
bundle methods because both have a quadratic master problem; however, it is
different because it changes the prox-center after every iteration instead of
only after a serious iteration. In the framework of the generic algorithm, PMCP
is a quadratic counterpart of the cutting plane algorithm, for neither attempts
to descend the dual function, and one has a linear master problem and the other
a quadratic one.

As mentioned earlier, the convergence of the CP algorithm does not require any
monotonicity of the dual function values. On one hand, it is curious that an
algorithm can converge without any attempt to decrease the dual function
directly. On the other hand, the convergence of the CP algorithm confirms that
decreases in the cutting plane approximating function are sufficient to ensure
that the dual function eventually converges (not necessarily in a monotonic
manner) to the optimal value. The convergence proof for PMCP in the next section
further corroborates this hypothesis.

Table 1 below summarizes the relationships among the four algorithms which use
cutting planes to approximate the objective function. Recall that the phrase
'attempt to descend the dual function' indicates that, although none of the
four algorithms guarantees a decrease in the dual function at every iteration,
some make an attempt to decrease the function in each one. In particular, the
bundle methods only yield a decrease at every serious step and CPLS yields one
whenever the dual function is differentiable. Nevertheless, all are proven to
converge to a solution of $D$.

                             Attempt to Descend the Dual Function
Master Problem               Yes                       No
---------------------------  ------------------------  ------
Linear ($c_k = \infty$)      CPLS                      CP
Quadratic ($c_k < \infty$)   Bundle methods or PMSB    PMCP

Table 1: Classes of algorithms which use cutting planes
3. Convergence of PMCP

Below, we restate more concisely the generic algorithm as specialized to the
proximal minimization algorithm with cutting planes.

A PROXIMAL MINIMIZATION ALGORITHM
WITH CUTTING PLANES (PMCP)

Step 1: Select $u^1 \in U$. Set $k = 1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \Big\{L^k(u) + \frac{1}{2c_k}\|u - u^k\|^2\Big\}.$$

If $u^{k+1} = u^k$, stop; $u^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Increment $k$ by 1 and go to Step 2.

First, note that since $v^{k+1}$ always equals $u^{k+1}$, the variable $v$ is
not needed and has been eliminated from the above algorithm. Then, recall that
in Step 2 $L^k(u)$ is convex and defined previously as

$$L^k(u) = \max_{i = 1, \ldots, k} \{f(x^i) + u \cdot g(x^i)\}.$$

In Step 3, $x^{k+1}$ satisfies

$$L^{k+1}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1}) = \mathcal{L}(u^{k+1}). \qquad (3)$$
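The PMCP steps can be sketched in Python for the simplest possible setting. The sketch below assumes a scalar dual variable on an interval [lo, hi], a finite set X, and a constant c; it solves the one-dimensional master problem by ternary search, which is a stand-in chosen for self-containment and not the quadratic programming methods cited in Section 2. All names and the toy data are illustrative.

```python
from typing import Callable, List, Tuple


def ternary_min(phi: Callable[[float], float], lo: float, hi: float,
                iters: int = 200) -> float:
    """Minimize a strictly convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if phi(a) <= phi(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


def pmcp_1d(X: list, f: Callable, g: Callable, lo: float, hi: float,
            u1: float, c: float = 1.0, tol: float = 1e-6,
            max_iter: int = 100) -> Tuple[float, List[Tuple[float, float]]]:
    """Scalar PMCP sketch: returns (final iterate, list of cuts (f(x^i), g(x^i)))."""
    u = u1
    cuts: List[Tuple[float, float]] = []
    for _ in range(max_iter):
        # Subproblem (Steps 1 and 3): x maximizing f(x) + u*g(x) gives a new cut.
        x = max(X, key=lambda x: f(x) + u * g(x))
        cuts.append((f(x), g(x)))
        # Master problem (Step 2): cutting plane model plus proximal term at u.
        center = u
        phi = lambda t: max(fi + t * gi for fi, gi in cuts) + (t - center) ** 2 / (2.0 * c)
        u_next = ternary_min(phi, lo, hi)
        # Stopping rule of Step 2, applied up to a numerical tolerance.
        if abs(u_next - u) <= tol:
            return u_next, cuts
        u = u_next  # PMCP always moves the prox-center
    return u, cuts


# Toy problem (assumed): L(u) = max_{x in {0,1,2,3}} x + u*(1 - x) on U = [0, 5].
u_star, final_cuts = pmcp_1d([0, 1, 2, 3], lambda x: float(x),
                             lambda x: 1.0 - x, 0.0, 5.0, u1=4.0)
```

On this toy problem the dual function values along the iterates are not monotone (an early step overshoots the minimizer), yet the iterates still settle at the minimizer, which is consistent with the claim that PMCP does not need to descend the dual function.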

The  theorem  below  validates  the  stopping  rule  in  Step  2. 
Theorem 1. If $u^{k+1} = u^k$, then $u^{k+1}$ is an optimal solution to problem
$D$.

Proof: Consider the cutting plane representation of the master problem at the
$k$th iteration:

$$\min\ w + \frac{1}{2c_k}\|u - u^k\|^2$$
$$\text{s.t.}\quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, k$$
$$\qquad\ Au \le b$$

Then, $(w^{k+1}, u^{k+1})$, where $w^{k+1} = L^k(u^{k+1})$, is an optimal
solution. Since $u^{k+1} = u^k$, it follows from (3) that

$$w^{k+1} = L^k(u^{k+1}) = L^k(u^k) = \mathcal{L}(u^k).$$

In addition, the KKT conditions are necessary at $(w^{k+1}, u^k)$ and there must
exist vectors $\pi$ and $\lambda$ satisfying the following equations:

$$\sum_{i \in I'} g(x^i)\,\pi_i + \sum_{j \in J'} a^j \lambda_j = 0$$
$$\sum_{i \in I'} \pi_i = 1$$
$$\pi_i,\ \lambda_j \ge 0 \qquad \forall i \in I' \text{ and } j \in J'$$

where

$a^j$ = the $j$th row of matrix $A$,

$I' = \{i : w^{k+1} = f(x^i) + u^k \cdot g(x^i) \text{ for } i = 1, \ldots, k\}$ and

$J' = \{j : a^j \cdot u^k = b_j \text{ for } j = 1, \ldots, p\}$.

Since $w^{k+1} = \mathcal{L}(u^k)$, the vectors $g(x^i)$, $\forall i \in I'$, are
subgradients of $\mathcal{L}$ at $u^k$ and

$$H(g(x^i) : i \in I') \subset \partial\mathcal{L}(u^k)$$

where $H(\cdot)$ denotes a convex hull. Thus, the KKT conditions can be written
more compactly as

$$0 \in \partial\mathcal{L}(u^k) + \sum_{j \in J'} a^j \lambda_j.$$

However, this is the KKT condition for problem $D$. Since $\mathcal{L}(u)$ is
convex and $U$ is a polyhedron, the condition is sufficient and the proof is
complete. □

By the above theorem, if PMCP stops after a finite number of iterations, it
must stop at an optimal solution. When PMCP generates an infinite sequence, it
is sufficient to show that PMCP converges to an optimal solution for the case
$c_k = c > 0$ $\forall k$. (This is true because of the assumption that
$\liminf_{k \to \infty} c_k > 0$.) To do so, define the following:

$X^\infty = \{x^1, x^2, x^3, \ldots\}$, i.e., the set of $x^i$ generated by
Steps 1 and 3 of PMCP.
$[X^\infty]$ = the closure of $X^\infty$. Note that $[X^\infty] \subset X$.
$L^\infty(u) = \max_{x \in [X^\infty]} \{f(x) + u \cdot g(x)\}$.

From the above description, it is clear that

$$L^k(u) \le L^\infty(u) \le \mathcal{L}(u) \quad \text{for } k = 1, 2, \ldots$$

where the first inequality follows from the fact that
$\{x^i : i = 1, \ldots, k\} \subset [X^\infty]$ and the second inequality from
the fact that $[X^\infty] \subset X$. Observe also that for any $k$

$$L^{k+j}(u^k) = \mathcal{L}(u^k) = f(x^k) + u^k \cdot g(x^k) \qquad \forall j = 0, 1, 2, \ldots \qquad (4)$$

Similarly, since $x^k \in [X^\infty]$, the following must hold:

$$L^\infty(u^k) = \mathcal{L}(u^k) \qquad \forall k < \infty. \qquad (5)$$

Moreover, $\{L^k(u)\}_k$ is a sequence of continuous convex functions which
converges pointwise to $L^\infty(u)$. However, since $\{L^k(u)\}_k$ is also
monotonic, it must also converge uniformly to $L^\infty(u)$ (see Theorem 7.13 in
Rudin, 1976).

To prove convergence and obtain a solution to $D$, define a sequence
$\{z^k\}_k$ as follows: let $z^1 = \mathcal{L}(u^1)$ and for
$k = 1, 2, 3, \ldots$ let

$$z^{k+1} = \begin{cases} \mathcal{L}(u^{k+1}) & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k \\ z^k & \text{otherwise} \end{cases}$$

where $m \in (0, 1)$. Also, we have from (4) that
$\mathcal{L}(u^{k+1}) = L^{k+1}(u^{k+1})$. So, computing $z^k$ requires no extra
effort. Next, construct an index set $\mathcal{K}$ as follows:

$$\mathcal{K} = \{k : z^{k+1} = \mathcal{L}(u^{k+1})\}.$$

In words, $\mathcal{K}$ is the index set of iterations in which there is a
sufficient decrease in the dual function, i.e., by an amount
$\frac{m}{2c}\|u^{k+1} - u^k\|^2$. The next two results address the convergence
of PMCP when $\mathcal{K}$ is an infinite set.
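The bookkeeping for the sequence of z values and the index set can be sketched as follows. The sketch records an index whenever the sufficient decrease test passes; the function name and the sample iterates and dual values are made up for illustration.

```python
from typing import List, Sequence, Tuple


def sufficient_decrease(dual_values: Sequence[float],
                        iterates: Sequence[Sequence[float]],
                        m: float, c: float) -> Tuple[List[float], List[int]]:
    """Return (z, K) where z[j] holds z^{j+1} and K lists 1-based indices k
    at which L(u^{k+1}) + (m/2c)*||u^{k+1} - u^k||^2 <= z^k holds.

    dual_values[j] = L(u^{j+1}) and iterates[j] = u^{j+1} (0-based storage).
    """
    z = [dual_values[0]]          # z^1 = L(u^1)
    K: List[int] = []
    for j in range(len(iterates) - 1):
        step_sq = sum((a - b) ** 2 for a, b in zip(iterates[j + 1], iterates[j]))
        if dual_values[j + 1] + m / (2.0 * c) * step_sq <= z[-1]:
            z.append(dual_values[j + 1])
            K.append(j + 1)       # iteration k = j + 1 gave sufficient decrease
        else:
            z.append(z[-1])
    return z, K


# Made-up scalar iterates and dual values, with m = 0.5 and c = 1.
us = [(4.0,), (3.0,), (2.0,), (1.0,), (0.0,), (1.0,)]
Ls = [4.0, 3.0, 2.0, 1.0, 3.0, 1.0]
z_seq, K_set = sufficient_decrease(Ls, us, m=0.5, c=1.0)
```

By construction the z values never increase even when the dual values themselves do, which is the property used in the proofs that follow.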

Lemma 2. Let $\mathcal{K}$ be an infinite set. If a subsequence
$\{u^k\}_{k \in K}$ converges to $u^\infty$ for some $K \subset \mathcal{K}$,
then $\{u^{k+1}\}_{k \in K}$ also converges to $u^\infty$.

Proof: Consider the sequence $\{z^k\}_k$. By definition, it is a nonincreasing
sequence which is bounded below by $\mathcal{L}(u^*)$. Thus, $\{z^k\}_k$ must
converge. Since $K \subset \mathcal{K}$, the following must hold for all
$k \in K$:

$$z^{k+1} + \frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k$$
$$\frac{m}{2c}\|u^{k+1} - u^k\|^2 \le z^k - z^{k+1}$$

Taking the limit as $k \to \infty$ and $k \in K$ yields that

$$\lim_{k \in K} \frac{m}{2c}\|u^{k+1} - u^k\|^2 = 0.$$

Since both $m$ and $c$ are positive, $\{u^{k+1}\}_{k \in K}$ and
$\{u^k\}_{k \in K}$ must have a common limit point, $u^\infty$. □

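Theorem 3. If $\mathcal{K}$ is an infinite set, then every limit point of the
sequence $\{u^k\}_{k \in \mathcal{K}}$ is a solution to $D$.

Proof: Let $u^*$ be a solution to $D$ and $\lim_{k \in K} u^k = u^\infty$ for
some $K \subset \mathcal{K}$. Since $u^{k+1}$ solves the master problem in Step
2, the following must hold:

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(u) + \frac{1}{2c}\|u - u^k\|^2 \qquad \forall u \in U \text{ and } k. \qquad (6)$$

For any $\alpha \in (0, 1)$, setting $u = \alpha u^* + (1 - \alpha)u^{k+1}$ in
(6) gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^* + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^* + (1 - \alpha)u^{k+1} - u^k\|^2$$

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le \alpha L^k(u^*) + (1 - \alpha)L^k(u^{k+1}) + \frac{1}{2c}\|\alpha(u^* - u^k) + (1 - \alpha)(u^{k+1} - u^k)\|^2$$

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le \alpha L^k(u^*) + (1 - \alpha)L^k(u^{k+1}) + \frac{1}{2c}\big(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\big)^2$$

$$\alpha L^k(u^{k+1}) \le \alpha L^k(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\big(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\big)^2$$

$$\alpha L^k(u^{k+1}) \le \alpha \mathcal{L}(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\big(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\big)^2 \qquad (7)$$

where the second inequality follows from convexity of $L^k(u)$, the third from
the triangle inequality and the last from the fact that
$L^k(u^*) \le \mathcal{L}(u^*)$. Since $L^j(u)$ is continuous for all
$j = 1, 2, \ldots$ and, from Lemma 2, $\|u^{k+1} - u^k\| \to 0$ for $k \in K$,
there must exist, for any $\varepsilon > 0$, a sufficiently large $k_1$ such
that for any $j$

$$|L^j(u^{k+1}) - L^j(u^k)| < \varepsilon, \qquad \forall k \in K \text{ and } k \ge k_1, \text{ or}$$

$$L^j(u^k) - \varepsilon < L^j(u^{k+1}) < L^j(u^k) + \varepsilon, \qquad \forall k \in K \text{ and } k \ge k_1.$$

Setting $j = k$ and using (3), i.e., $L^k(u^k) = \mathcal{L}(u^k)$, yields the
following:

$$\mathcal{L}(u^k) - \varepsilon < L^k(u^{k+1}) < \mathcal{L}(u^k) + \varepsilon, \qquad \forall k \in K \text{ and } k \ge k_1.$$

Combine the left inequality with (7) to obtain that

$$\alpha(\mathcal{L}(u^k) - \varepsilon) < \alpha \mathcal{L}(u^*) - \frac{1}{2c}\|u^{k+1} - u^k\|^2 + \frac{1}{2c}\big(\|\alpha(u^* - u^k)\| + \|(1 - \alpha)(u^{k+1} - u^k)\|\big)^2, \qquad \forall k \in K \text{ and } k \ge k_1.$$

Take the limit as $k \to \infty$ and $k \in K$ and obtain

$$\alpha(\mathcal{L}(u^\infty) - \varepsilon) \le \alpha \mathcal{L}(u^*) + \frac{1}{2c}\|\alpha(u^* - u^\infty)\|^2$$

$$\mathcal{L}(u^\infty) - \varepsilon \le \mathcal{L}(u^*) + \frac{\alpha}{2c}\|u^* - u^\infty\|^2$$

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) \le \frac{\alpha}{2c}\|u^* - u^\infty\|^2 + \varepsilon. \qquad (8)$$

Since (8) holds for any $\alpha \in (0, 1)$ and $\varepsilon$ can be chosen
arbitrarily small, and since $\mathcal{L}(u^\infty) \ge \mathcal{L}(u^*)$ by the
optimality of $u^*$, it must be true that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^*) = 0.$$

Thus, $u^\infty$ is a solution to $D$. □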
An immediate consequence of Theorem 3 is that the entire sequence
$\{u^k\}_{k \in \mathcal{K}}$ converges to the optimal solution when $D$ has a
unique solution (see, e.g., Bazaraa and Shetty, 1979).

Consider now the case when $\mathcal{K}$ is finite. Define
$\ell = \max\{k : k \in \mathcal{K}\} + 1$. Then,

$$z^k = z^\ell \qquad \forall k \ge \ell, \text{ and}$$

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 > z^k = z^\ell, \qquad \forall k \ge \ell. \qquad (9)$$

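Lemma 4. Let $\mathcal{K}$ be a finite set and $\ell$ be as defined above. Then,

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 = 0.$$

Proof: Assume otherwise, i.e., there exists a $\delta > 0$ such that

$$\liminf_{k \ge \ell} \|u^{k+1} - u^k\|^2 > \delta. \qquad (10)$$

In other words, for a sufficiently large $k_1 \ge \ell$,

$$\|u^{k+1} - u^k\|^2 \ge \delta, \qquad \forall k \ge k_1. \qquad (11)$$

From Theorem 3, setting $u = u^k$ in (6) produces the following:

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(u^k) \qquad \forall k. \qquad (12)$$

Since $\{L^k(u)\}_k$ converges uniformly to $L^\infty(u)$, there must exist for
every $\varepsilon \in (0, \delta)$ a sufficiently large $k_2$ such that

$$|L^k(u) - L^\infty(u)| < \frac{\varepsilon}{4c} \qquad \forall k \ge k_2 \text{ and } u \in U, \text{ or}$$

$$L^\infty(u) - \frac{\varepsilon}{4c} < L^k(u) < L^\infty(u) + \frac{\varepsilon}{4c} \qquad \forall k \ge k_2 \text{ and } u \in U. \qquad (13)$$

Combining (12) and (13) yields

$$L^\infty(u^{k+1}) - \frac{\varepsilon}{4c} + \frac{1}{2c}\|u^{k+1} - u^k\|^2 < L^\infty(u^k) + \frac{\varepsilon}{4c} \qquad \forall k \ge k_2$$

$$L^\infty(u^{k+1}) + \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} < L^\infty(u^k) \qquad \forall k \ge k_2.$$

Using (5), we must have that

$$\mathcal{L}(u^{k+1}) + \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} < \mathcal{L}(u^k) \qquad \forall k \ge k_2. \qquad (14)$$

However, (11) and (14) imply that the subsequence
$\{\mathcal{L}(u^k)\}_{k \ge \bar{k}}$, where $\bar{k} = \max(k_1, k_2)$, is a
monotonically decreasing sequence and bounded below by $\mathcal{L}(u^*)$.
Therefore, $\{\mathcal{L}(u^k)\}_{k \ge \bar{k}}$ must converge and

$$\lim_{k \ge \bar{k}} \frac{1}{2c}\{\|u^{k+1} - u^k\|^2 - \varepsilon\} = \lim_{k \ge \bar{k}} (\mathcal{L}(u^{k+1}) - \mathcal{L}(u^k)) = 0$$

$$\lim_{k \ge \bar{k}} \|u^{k+1} - u^k\|^2 = \varepsilon.$$

Since $\varepsilon$ can be chosen arbitrarily small, this contradicts (10). □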

The above lemma implies that there exists a $K \subset \{k : k \ge \ell\}$ such
that

$$\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0.$$

Since $U$ is compact and $u^k \in U$ for all $k$, there must also exist a
$K' \subset K$ such that $\{u^k\}_{k \in K'}$ converges to, say, $u^\infty$. As a
consequence, $\{u^{k+1}\}_{k \in K'}$ must converge to $u^\infty$ as well.


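Theorem 5. If $\mathcal{K}$ is finite, then $u^\ell$ is a solution to $D$.

Proof: Based on the preceding discussion, there must exist a
$K \subset \{k : k \ge \ell\}$ such that the following conditions hold:

1. $\lim_{k \in K} \|u^{k+1} - u^k\|^2 = 0$.

2. $\{u^k\}_{k \in K} \to u^\infty$.

3. $\{u^{k+1}\}_{k \in K} \to u^\infty$.

From Theorem 3, setting $u = \alpha u^* + (1 - \alpha)u^{k+1}$ in (6) for any
$\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^* + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^* + (1 - \alpha)u^{k+1} - u^k\|^2.$$

Using the same argument as in Theorem 3 with the index set $K$, it can be shown
that

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*). \qquad (15)$$

Similarly, setting $u = \alpha u^\ell + (1 - \alpha)u^{k+1}$ in (6) for any
$\alpha \in (0, 1)$ gives

$$L^k(u^{k+1}) + \frac{1}{2c}\|u^{k+1} - u^k\|^2 \le L^k(\alpha u^\ell + (1 - \alpha)u^{k+1}) + \frac{1}{2c}\|\alpha u^\ell + (1 - \alpha)u^{k+1} - u^k\|^2,$$

and by the same reasoning it must follow that

$$\mathcal{L}(u^\infty) - \mathcal{L}(u^\ell) \le 0 \quad \text{or} \quad \mathcal{L}(u^\infty) \le \mathcal{L}(u^\ell). \qquad (16)$$

However, from (9) it is true that

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c}\|u^{k+1} - u^k\|^2 > z^\ell, \qquad \forall k \ge \ell.$$

Take the limit as $k \to \infty$ and $k \in K$ and invoke the continuity of
$\mathcal{L}(u)$ to obtain that

$$\mathcal{L}(u^\infty) \ge z^\ell = \mathcal{L}(u^\ell). \qquad (17)$$

Combining (15), (16) and (17) yields

$$\mathcal{L}(u^\infty) = \mathcal{L}(u^*) = \mathcal{L}(u^\ell).$$

So, $u^\ell$ must be a solution to $D$. □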

In addition to the above convergence results, if $f(x)$ and $g(x)$ are linear
functions and $X$ is a bounded polyhedron, then $x^{k+1}$ in Step 3 can be
restricted to extreme points of $X$, of which there are finitely many. In that
case, there must exist a sufficiently large $\ell$ such that
$L^k(u) = L^\ell(u)$ $\forall k \ge \ell$ and $u \in U$. However, this implies
that after $\ell$ iterations PMCP reduces to the application of the proximal
minimization algorithm to the following linear program:

$$\min\ w$$
$$\text{s.t.}\quad w \ge f(x^i) + u \cdot g(x^i), \quad i = 1, \ldots, \ell$$
$$\qquad\ Au \le b$$

Then, it follows from Exercise 4.3 in Bertsekas and Tsitsiklis (1989) that PMCP
terminates finitely when $D$ is the dual of a linear program, or equivalently,
$\mathcal{L}(u)$ is piecewise linear.


4. A Bundle Method

Below, we describe a particular variant of the bundle methods which uses a
different scheme for updating $v^{k+1}$ in Step 4 of the generic algorithm. For
later reference, we call this variant a proximal minimization algorithm with
subgradient bundles (PMSB). One intention of this section is to present a
convergence proof for PMSB which uses analysis similar to that of PMCP, thereby
making the relationship between bundle methods and proximal minimization more
concrete. Also, it should be noted that some variants of the bundle methods
require a line search step (see, e.g., Fukushima, 1984, Gaudioso and Monaco,
1982 and Kiwiel, 1985). However, PMSB as stated below does not require any line
search.

A PROXIMAL MINIMIZATION ALGORITHM
WITH SUBGRADIENT BUNDLES (PMSB)

Step 1: Select $u^1 \in U$ and $m$ such that $0 < m < 1$. Set $k = 1$,
$v^1 = u^1$ and

$$x^1 = \arg\max_{x \in X} \{f(x) + u^1 \cdot g(x)\}.$$

Step 2: Solve the master problem

$$u^{k+1} = \arg\min_{u \in U} \Big\{L^k(u) + \frac{1}{2c_k}\|u - v^k\|^2\Big\}.$$

If $u^{k+1} = v^k$, stop; $v^k$ is an optimal solution.

Step 3: Solve the subproblem

$$x^{k+1} = \arg\max_{x \in X} \{f(x) + u^{k+1} \cdot g(x)\}.$$

Note that $\mathcal{L}(u^{k+1}) = f(x^{k+1}) + u^{k+1} \cdot g(x^{k+1})$.

Step 4: Set

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + \frac{m}{2c_k}\|u^{k+1} - v^k\|^2 \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (18)$$
Recall that, when $v^{k+1} = u^{k+1}$, iteration $k$ is called a 'serious' step
or iteration. Otherwise ($v^{k+1} = v^k$), it is called a 'null' step. In
addition, the updating formula for $v^{k+1}$ in Step 4 is also related to the
one in Section 2 (see also Lemarechal, 1991), which is

$$v^{k+1} = \begin{cases} u^{k+1} & \text{if } \mathcal{L}(u^{k+1}) + m\,(L^k(v^k) - L^k(u^{k+1})) \le \mathcal{L}(v^k) \\ v^k & \text{otherwise} \end{cases} \qquad (19)$$

To obtain the relationship, observe that since $u^{k+1}$ is a solution to the
master problem,

$$L^k(u^{k+1}) + \frac{1}{2c_k}\|u^{k+1} - v^k\|^2 \le L^k(v^k)$$

$$\frac{1}{2c_k}\|u^{k+1} - v^k\|^2 \le L^k(v^k) - L^k(u^{k+1})$$

$$\frac{m}{2c_k}\|u^{k+1} - v^k\|^2 \le m\,(L^k(v^k) - L^k(u^{k+1}))$$

$$\mathcal{L}(u^{k+1}) + \frac{m}{2c_k}\|u^{k+1} - v^k\|^2 \le \mathcal{L}(u^{k+1}) + m\,(L^k(v^k) - L^k(u^{k+1})).$$

So, whenever the test in (19) is satisfied, the test in Step 4 is satisfied as
well, i.e., the updating formula (19) implies (18).

When PMSB terminates finitely, Theorem 1 in the previous section still guarantees that $v^k$ is an optimal solution to D. Below are convergence results for the case when the algorithm generates an infinite sequence. As in Section 3, it is assumed without loss of generality that $c_k = c > 0$ $\forall k$, and let

\[ \mathcal{K} = \{ k : v^{k+1} = u^{k+1} \}. \]

So, $\mathcal{K}$ is the index set for the serious steps (iterations).

Lemma 6. Let $\mathcal{K}$ be an infinite set. If a subsequence $\{v^k\}_{k \in K}$ converges to $v^\infty$ where $K \subset \mathcal{K}$, then $\{v^{k+1}\}_{k \in K}$ also converges to $v^\infty$.

Proof: Note that the sequence $\{\mathcal{L}(v^k)\}_k$ is a nonincreasing sequence which is bounded below by $\mathcal{L}(u^*)$. So, $\{\mathcal{L}(v^k)\}_k$ must converge. Since $K \subset \mathcal{K}$, the following must hold:

\[ \mathcal{L}(v^{k+1}) + \frac{m}{2c} \|v^{k+1} - v^k\|^2 \le \mathcal{L}(v^k) \quad \forall k \in K. \]

Following the same argument as in Lemma 2, it can be shown that

\[ \lim_{k \in K} \frac{m}{2c} \|v^{k+1} - v^k\|^2 = 0. \]

Since both $c$ and $m$ are positive, $\{v^{k+1}\}_{k \in K}$ must converge to $v^\infty$. □

Theorem 7. If the cardinality of $\mathcal{K}$ is infinite, then every limit point of the sequence $\{v^k\}_{k \in \mathcal{K}}$ is a solution to D.

Proof: Let $\{v^k\}_{k \in K}$ be a convergent subsequence where $K \subset \mathcal{K}$. Since $v^{k+1} = u^{k+1}$ $\forall k \in K$ and $u^{k+1}$ is optimal to the master problem, the following must hold:

\[ L^k(v^{k+1}) + \frac{1}{2c} \|v^{k+1} - v^k\|^2 \le L^k(u) + \frac{1}{2c} \|u - v^k\|^2, \quad \forall u \in U \text{ and } k \in K. \]

Using the same analysis as in Theorem 3 and the result of Lemma 6, it can be shown that $\{v^k\}_{k \in K}$ converges to $u^*$. □

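When the cardinality of $\mathcal{K}$ is finite, define as before $\ell = \max\{k : k \in \mathcal{K}\} + 1$. So, every iteration $k \ge \ell$ must be a null step and the master problem in Step 2 must have the form:

\[ u^{k+1} = \arg\min_{u \in U} \Big\{ L^k(u) + \frac{1}{2c} \|u - v^\ell\|^2 \Big\}, \quad \forall k \ge \ell. \]

Next, let

\begin{align*}
F^k(u) &= L^k(u) + \frac{1}{2c} \|u - v^\ell\|^2, \text{ and} \\
F^\infty(u) &= L^\infty(u) + \frac{1}{2c} \|u - v^\ell\|^2.
\end{align*}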


Then, $F^k(u)$ epi-converges to $F^\infty(u)$ since $L^k(u)$ pointwise and monotonically converges to $L^\infty(u)$. (For the definition and properties of epi-convergence, see, e.g., the appendix in Wets, 1989.) In addition, it follows from Theorem A.2 of Wets (1989) that if $\{u^{k+1}\}_{k \in K} \to u^\infty$ for some $K \subset \{1, 2, 3, \ldots\}$, then

\[ u^\infty = \arg\min_{u \in U} F^\infty(u). \]
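The monotone behavior of the cutting-plane models used in this argument can be checked numerically on a small example. The sketch below rests on assumptions that are not taken from the paper: it uses the toy choice $f(x) = -x^2$, $g(x) = 2x$, for which $\max_x \{f(x) + u\,g(x)\} = u^2$, a hypothetical list of trial points at which cuts are generated, and a model defined as the pointwise maximum of the cuts collected so far. It records the largest gap between the true objective and the model on a grid after each new cut.

```python
def true_obj(u):
    # Toy objective: max over x of (-x**2 + 2*x*u) equals u**2, attained at x = u.
    return u * u

def cut(w):
    # Cut generated at trial point w: (intercept f(x), slope g(x)) with x = w.
    return (-w * w, 2.0 * w)

def model(u, cuts):
    # Assumed cutting-plane model: pointwise maximum of the cuts collected so far.
    return max(a + b * u for a, b in cuts)

trial_points = [-2.0, 2.0, 0.0, 1.0, -1.0]        # hypothetical trial points
grid = [-2.0 + 0.1 * i for i in range(41)]        # evaluation grid on [-2, 2]

cuts, gaps = [], []
for w in trial_points:
    cuts.append(cut(w))
    # Largest underestimation error of the current model on the grid.
    gaps.append(max(true_obj(u) - model(u, cuts) for u in grid))

print(gaps)
```

Because each new cut can only raise the model, and every cut lies below the convex objective, the recorded gaps never increase and never become negative (up to rounding).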

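Furthermore, since $L^k(\cdot)$ also uniformly converges to $L^\infty(\cdot)$, there must exist, for any $\varepsilon > 0$, a sufficiently large $k_1$ such that

\[ |L^k(u) - L^\infty(u)| \le \varepsilon \quad \forall u \in U \text{ and } k \ge k_1, \]

and by setting $u = u^{k+1}$

\begin{align*}
|L^k(u^{k+1}) - L^\infty(u^{k+1})| &\le \varepsilon \quad \forall k \ge k_1, \\
|L^k(u^{k+1}) - \mathcal{L}(u^{k+1})| &\le \varepsilon \quad \forall k \ge k_1, \\
\mathcal{L}(u^{k+1}) - \varepsilon \le L^k(u^{k+1}) &\le \mathcal{L}(u^{k+1}) + \varepsilon \quad \forall k \ge k_1, \tag{20}
\end{align*}

where the middle inequality follows from (5).

Theorem 8. If the cardinality of $\mathcal{K}$ is finite, then $v^\ell$ is a solution to D.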


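Proof: Since $U$ is a compact set, there must exist a set $K \subset \{k : k \ge \ell\}$ such that $\{u^{k+1}\}_{k \in K}$ converges to $u^\infty$.

If $u^\infty = v^\ell$, then the above observation concerning the epi-convergence of $F^k(u)$ implies that

\begin{align*}
\mathcal{L}(v^\ell) &= L^\infty(v^\ell) \\
&= L^\infty(u^\infty) \\
&= \min_{u \in U} \Big\{ L^\infty(u) + \frac{1}{2c} \|u - v^\ell\|^2 \Big\} \\
&\le \min_{u \in U} \Big\{ \mathcal{L}(u) + \frac{1}{2c} \|u - v^\ell\|^2 \Big\} \\
&\le \mathcal{L}(v^\ell),
\end{align*}

where the first equation follows from (5), the third from the observation that $u^\infty = \arg\min_{u \in U} F^\infty(u)$, the fourth from the fact that $L^\infty(u) \le \mathcal{L}(u)$ $\forall u \in U$, and the last from the fact that $v^\ell$ is an element of $U$. Thus,

\[ \mathcal{L}(v^\ell) = \min_{u \in U} \Big\{ \mathcal{L}(u) + \frac{1}{2c} \|u - v^\ell\|^2 \Big\}. \]

This implies that $v^\ell$ solves D: $v^\ell$ attains the minimum of the proximal problem centered at itself, and for convex $\mathcal{L}$ such a fixed point of the proximal step is a minimizer of $\mathcal{L}$ over $U$.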
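The last implication relies on a standard property of the proximal step: a point is a minimizer of a convex function over $U$ exactly when the proximal step centered at that point does not move it. The sketch below illustrates this on a hypothetical one-dimensional example. The piecewise-linear objective, the interval $[0, 5]$, the value $c = 0.2$, and the helper name `prox_point` are assumptions made only for illustration, and the proximal problem is solved by a simple ternary search.

```python
def prox_point(obj, v, c, lo, hi, iters=200):
    """Approximate argmin over [lo, hi] of obj(u) + (u - v)^2 / (2c).

    The function being minimized is strictly convex when obj is convex,
    so a ternary search on the interval is sufficient for this sketch.
    """
    phi = lambda u: obj(u) + (u - v) ** 2 / (2.0 * c)
    a, b = lo, hi
    for _ in range(iters):
        p = a + (b - a) / 3.0
        q = b - (b - a) / 3.0
        if phi(p) <= phi(q):
            b = q
        else:
            a = p
    return 0.5 * (a + b)


# Hypothetical convex objective with its minimizer over [0, 5] at u = 1.
obj = lambda u: max(4.0 - 1.5 * u, 2.5 * u)

at_min = prox_point(obj, v=1.0, c=0.2, lo=0.0, hi=5.0)  # center is the minimizer
away = prox_point(obj, v=3.0, c=0.2, lo=0.0, hi=5.0)    # center is not optimal
print(at_min, away)
```

With the center at the minimizer $u = 1$ the proximal step stays at $1$, whereas with the center at $u = 3$ the step moves to $u = 2.5$, where $2.5 + (u - 3)/0.2 = 0$.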

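Assume that $u^\infty \ne v^\ell$. Let $\delta = \|u^\infty - v^\ell\|^2$. Then, there must exist a sufficiently large $k_2$ such that

\[ \|u^{k+1} - v^\ell\|^2 \ge \frac{\delta}{2} \quad \forall k \ge k_2 \text{ and } k \in K. \tag{21} \]

However, since $u^{k+1}$ solves the master problem in Step 2 with $v^\ell$ as its prox-center, it must be true that

\[ L^k(u^{k+1}) + \frac{1}{2c} \|u^{k+1} - v^\ell\|^2 \le L^k(v^\ell) = \mathcal{L}(v^\ell), \quad \forall k \ge \ell, \]

where the equality follows from (4) in Section 3. For any $\varepsilon > 0$, let $k_1$ be as in (20) so that

\[ \mathcal{L}(u^{k+1}) - \varepsilon + \frac{1}{2c} \|u^{k+1} - v^\ell\|^2 \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1). \]

Set $\varepsilon = \frac{(1 - m)\delta}{4c}$ and obtain

\[ \mathcal{L}(u^{k+1}) + \frac{1}{2c} \Big( \|u^{k+1} - v^\ell\|^2 - (1 - m) \frac{\delta}{2} \Big) \le \mathcal{L}(v^\ell), \quad \forall k \ge \max(\ell, k_1). \]

Then, for any $k \ge \max(\ell, k_1, k_2)$ and $k \in K$, (21) implies that

\begin{align*}
\mathcal{L}(v^\ell) &\ge \mathcal{L}(u^{k+1}) + \frac{1}{2c} \Big( \|u^{k+1} - v^\ell\|^2 - (1 - m) \frac{\delta}{2} \Big) \\
&\ge \mathcal{L}(u^{k+1}) + \frac{1}{2c} \Big( \|u^{k+1} - v^\ell\|^2 - (1 - m) \|u^{k+1} - v^\ell\|^2 \Big) \\
&\ge \mathcal{L}(u^{k+1}) + \frac{m}{2c} \|u^{k+1} - v^\ell\|^2.
\end{align*}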

This means that the test in Step 4 is satisfied at such an iteration, so there must be a serious step after iteration $\ell$, which is a contradiction. Thus, every convergent subsequence of $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$, which is a solution to D. Since $U$ is compact, this implies that the sequence $\{u^{k+1}\}_{k \ge \ell}$ converges to $v^\ell$ as well. □

5. Conclusion

This paper presents a generic algorithm in the framework of proximal minimization. It is shown that this generic algorithm can be specialized to four different algorithms: the cutting plane algorithm, the cutting plane algorithm with line searches, the bundle methods (or proximal minimization with subgradient bundles, PMSB) and proximal minimization with cutting planes (PMCP). The first three can be found in the current literature; however, the last one is new.

Besides the obvious relationship that all four algorithms can be derived from the generic algorithm, other relationships based on the master problem and convergence behavior are also established. Convergence proofs for PMSB and PMCP are also given.